Mon. Not. R. Astron. Soc. 000, 1-?? (2009) Printed 26 February 2010 (MN LaTeX style file v2.2)



The spatial distribution of cold gas in hierarchical galaxy 
formation models 






Han-Seok Kim$^{1}$, C.M. Baugh$^{1}$, A.J. Benson$^{2}$, S. Cole$^{1}$, C.S. Frenk$^{1}$, C.G. Lacey$^{1}$,
C. Power$^{3}$, M. Schneider$^{1}$

$^{1}$ Institute for Computational Cosmology, Department of Physics, University of Durham, South Road, Durham DH1 3LE, UK
$^{2}$ Theoretical Astrophysics, Caltech, MC350-17, 1200 E. California Blvd., Pasadena CA 91125, USA
$^{3}$ Department of Physics and Astronomy, University of Leicester, Leicester LE1 7RH, UK






ABSTRACT 

The distribution of cold gas in dark matter haloes is driven by key processes in galaxy 
formation: gas cooling, galaxy mergers, star formation and reheating of gas by super- 
novae. We compare the predictions of four different galaxy formation models for the 
spatial distribution of cold gas. We find that satellite galaxies make little contribution 
to the abundance or clustering strength of cold gas selected samples, and are far less 
important than they are in optically selected samples. The halo occupation distribu- 
tion function of present-day central galaxies with cold gas mass > 10^ Mq is peaked 
around a halo mass of « 10^^h~^MQ, a scale that is set by the AGN suppression of 
gas cooling. The model predictions for the projected correlation function are in good 
agreement with measurements from the HI Parkes All-Sky Survey. We compare the 
effective volume of possible surveys with the Square Kilometre Array with those expected
for a redshift survey in the near-infrared. Future redshift surveys using neutral
hydrogen emission will be competitive with the most ambitious spectroscopic surveys 
planned in the near-infrared. 

1 INTRODUCTION

Cold gas is central to galaxy formation yet little is known 
about how much there is in the Universe at different epochs 
and how this gas is distributed in dark matter haloes of dif- 
ferent mass. The primary probe of atomic hydrogen, 21cm 
line emission, is incredibly weak. It is only in recent years 
that a robust and comprehensive census of atomic hydrogen 
(HI) in the local universe has been made possible through 
the HI Parkes All Sky Survey (Barnes et al. 2001; Zwaan 
et al. 2003, 2005). This work is being extended to lower mass 
systems by the ALFALFA survey (Giovanelli et al. 2007). 
Despite this progress, the highest redshift direct detection 
of HI in emission is very firmly confined to the local Universe 
at z ^ 0.34 (Lah et al. 2009, see also Verheijen et al. 2007). 
Information about cold gas in the high redshift Universe is 
restricted to absorption lines in quasar spectra (e.g. Peroux 
et al. 2003). However, over the coming decade, this situation 
is expected to change dramatically with the construction of 
new, more sensitive radio telescopes such as the pathfind- 
ers for the Square Kilometre Array, MeerKAT (Booth et al. 
2009) and ASKAP (Johnston et al. 2008), and the Square 
Kilometre Array itself (Schilizzi, Dewdney & Lazio 2008).
The SKA will revolutionise our understanding of galaxy for- 
mation and cosmology, uncovering the HI Universe out to 
high redshifts. One of the major science goals is to better 
characterise the evolution of dark energy with redshift. The 
SKA is expected to provide competitive constraints on the 
nature of dark energy through high accuracy measurement 
of large-scale structure in the galaxy distribution over a look- 
back time representing a significant fraction of the age of the 
Universe (Albrecht et al. 2006). This conclusion currently 
rests on very uncertain calculations which we seek to place 
on a firmer, more physical footing in this paper. 

Modelling the abundance and clustering of HI sources 
is challenging. A number of possible approaches have been
tried: empirical modelling, which relies upon observations
of HI in the Universe; the fully numerical approach,
which uses cosmological gas dynamics simulations to model
the HI content of galaxies from first principles; and
semi-analytical modelling, which we use in this paper. Empirical
estimates have been attempted despite the paucity of ob- 
servational results for guidance (Abdalla & Rawlings 2005; 
Abdalla, Blake & Rawlings 2010). Such calculations require 
an assumption about the evolution of the HI mass function 
over a broad redshift interval. The only constraint on this 
assumption is the integrated density of HI, which can be
compared with the results inferred from quasar absorption 
features, which themselves require corrections for unseen 
low column density systems and dust extinction (Storrie- 
Lombardi et al. 1996). The empirical approach does not pre- 
dict the clustering of HI sources. Further assumptions and 
approximations are necessary to extend this class of modelling
so that predictions can be made for galaxy clustering.
Another layer of approximation in this class of modelling
has been motivated by observations which suggest that HI
sources tend to avoid the centres of clusters and that clusters
do not boast an important population of satellites (e.g.
Waugh et al. 2002; Verheijen et al. 2007). This led Marin
et al. (2009) to make a one-to-one connection between halo 
mass and HI mass. However, the nature of the relation is un- 
certain and several possibilities are explored by Marin et al. 
based on different assumptions about the evolution of the 
HI mass function. 

Ideally, a physically motivated model which follows the 
sources and sinks of cold gas is needed. Gas dynamic simu- 
lations are computationally expensive and are typically re- 
stricted to small computational volumes, which makes it 
impossible to accurately follow the growth of structure to 
the present day. An example is provided by Popping et al. 
(2009), who carry out a smoothed particle hydrodynamics 
simulation in a $32\,h^{-1}$ Mpc box. The HI mass function in
the simulation is in very poor agreement with the observational
estimate of Zwaan et al. (2005), underpredicting the
abundance of galaxies of HI mass $10^{10}\,M_\odot$ by a factor of
30, which the authors put down to the small computational
volume, and overpredicting low mass systems by a factor
of two. Clustering predictions are limited to scales smaller
than a few Mpc due to the small box size. Furthermore, it is
important to be aware that gas dynamic simulations do not
have the resolution to follow all of the processes in galaxy 
formation directly and in all cases resort to what are essen- 
tially semi-analytical rules to treat sub-resolution physics. 

Currently the most promising route to making phys- 
ical and robust predictions for the HI in the Universe is 
semi-analytical modelling of galaxy formation (see Baugh 
2006). This type of model includes a simplified but physi- 
cally motivated treatment of the processes which control the 
amount of cold gas in a galaxy: gas cooling, galaxy mergers, 
star formation and reheating of gas by supernovae. These
calculations are quick and can rapidly cover the haloes in
a cosmological volume. Baugh et al. (2004) presented predictions
for the mass function of cold gas galaxies in the
GALFORM semi-analytical model of Cole et al. (2000). One 
issue which must be dealt with is that the models predict 
only the total mass of cold gas, which includes helium, and 
both atomic and molecular hydrogen. Baugh et al. assumed 
a fixed ratio of molecular to atomic hydrogen. Obreschkow & 
Rawlings (2009) developed an empirical model based on ob- 
servations and theoretical arguments by Blitz & Rosolowsky 
(2006) in which this ratio could vary from galaxy to galaxy. 
Obreschkow & Rawlings applied this ansatz to the semi- 
analytical model of de Lucia & Blaizot (2007). 

In the first paper in this series, we compared the pre- 
dictions of a range of semi-analytical models for the mass 
function of HI (Power et al. 2009). Despite the different im- 
plementations of the physical ingredients used in the models 
and the different emphasis placed on various observations 



when setting the model parameters, the predictions show 
generic features. Power et al. found that there is surpris- 
ingly little variation in the predicted HI mass function with 
redshift, and that the models make similar predictions for 
the rotation speed and size of HI systems. The models pre- 
dict the mass of cold gas and so a conversion is required 
to turn this into a HI mass. Currently the most uncertain 
step is the assumption about what fraction of hydrogen is 
in atomic form and what fraction is molecular. Power et al. 
presented predictions for two cases, one in which all model 
galaxies are assumed to have a fixed molecular to atomic 
hydrogen ratio (H$_2$/HI) and the other in which this ratio
varies from galaxy to galaxy, depending upon the local conditions
in the galactic disk (Blitz & Rosolowsky 2006). The
assumption of a variable H$_2$/HI ratio results in a dramatic
reduction in the number of HI sources in the tail of the red- 
shift distribution. 

In this paper we look at the distribution of cold gas in 
galaxies as a function of halo mass. In particular we look 
at the halo occupation distribution (HOD) for HI galaxies, 
which gives the mean number of galaxies of a given HI mass 
as a function of dark matter halo mass, and the clustering 
of HI galaxies. Using this information, we assess the poten- 
tial of the SKA to measure the baryonic acoustic oscillation 
(BAO) signal. We briefly review the GALFORM model in Section
2, explaining the differences between the four models
that we consider. We then look at the halo occupation dis- 
tribution of cold gas galaxies in Section 3, in which we also 
present predictions for the clustering of cold gas galaxies 
at different redshifts and compare to measured clustering at 
the present day. In Section 4 we compare the performance of 
future redshift surveys in the optical and using HI emission 
for measuring the properties of the dark energy. We present 
a summary along with our conclusions in Section 5. 



2 GALAXY FORMATION MODELS AND 
BASIC PREDICTIONS 

Semi-analytical models of galaxy formation invoke simple, 
physically motivated recipes to follow the fate of the baryons 
in a universe in which structure in the dark matter grows 
hierarchically (White & Rees 1978; White & Frenk 1991; 
Kauffmann et al. 1993; Cole et al. 1994; for a review of this 
approach see Baugh 2006). The current generation of models 
include a wide range of phenomena, ranging from the heat- 
ing of the intergalactic medium, which affects the cooling of 
gas in low mass haloes, to the suppression of cooling flows in 
massive haloes due to heating by accretion of matter onto 
supermassive black holes (e.g. Bower et al. 2006; Croton 
et al. 2006; Cattaneo et al. 2007; Monaco et al. 2007; La- 
gos, Cora & Padilla 2008). In this paper, we use the Durham 
semi-analytical galaxy formation code GALFORM to make pre- 
dictions for the amount of cold gas in dark matter haloes of 
different masses. This code was introduced by Cole et al. 
(2000) and has been developed in a series of papers (Benson 
et al. 2003; Baugh et al. 2005; Bower et al. 2006; Font et al. 
2008). The code predicts a wide range of properties for the 
galaxy population in the context of a spatially flat cold dark 
matter cosmology with a cosmological constant. 

Table 1. The values of selected parameters which are different in the models. The columns are as follows: (1) The name of the model. (2) The equation used to calculate the star formation timescale, $\tau_\star$. (3) The value of $\epsilon_\star$ or $\tau_\star^0$ used in the star formation timescale. (4) The AGN feedback parameter, $\alpha_{\rm cool}$ (Eq. 1). (5) The supernova feedback parameter, $V_{\rm hot}$ (Eq. 2). (6) The source of halo merger histories. (7) Comments giving model source or key differences from published models.

Figure 1. The predicted ratio of neutral hydrogen mass to B-band luminosity (upper panels) and the cold gas mass function (lower panels) in the Bow06 (left panels) and MHIBow06 models (right panels). In the upper panels, the magenta points show observational estimates of the hydrogen mass to luminosity ratio using data from Huchtmeier & Richter (1988) (HI) and Sage (1993) (H$_2$). The black points show the median ratio predicted by the models and the grey shading shows the 20-80 percentile range of the predicted distribution. We assume that 76% by mass of the cold gas mass predicted by the models is neutral hydrogen. In the lower panels, the magenta points show the cold gas mass function derived from the HI mass function estimated by Zwaan et al. (2005). Here, a constant H$_2$/HI ratio of 0.4 has been assumed to convert the HI measurement into a cold gas mass.

In this paper we consider four different models run using
GALFORM. Two of these are available from the Millennium
Archive$^{1}$; these are the Bower et al. (2006; hereafter
Bow06) and Font et al. (2008) models (hereafter Font08).
The third model is a modified version of the Bow06 model 
(which we label as MHIBow06), which is discussed in more 
detail below. In this model a small number of parameters 
have been adjusted from the values used in Bow06 in order 
to produce a better match to the cold gas mass function 
estimated by Zwaan et al. (2005). The fourth model (de- 
noted by GpcBow06) is set in a different background cos- 
mology from the other three, which adopt the cosmology of 
the Millennium simulation (Springel et al. 2005). The cos- 
mology of the GpcBow06 model is in better agreement with 
recent measurements of the cosmic microwave background 
and the large-scale structure of the Universe (Sanchez et al. 
2009).$^{2}$ The Bow06, Font08 and MHIBow06 models use
merger histories extracted from the Millennium Simulation. 
The GpcBow06 model uses Monte Carlo generated merger 
trees as described below. When we make predictions for the 
spatial distribution of galaxies in the GpcBow06 model, we
use the GigaParsec simulation run at the Institute for Computational
Cosmology (GPICC; Baugh et al. in preparation),
which uses 10 billion particles to model the hierarchical
clustering of mass in a simulation cube $1000\,h^{-1}$ Mpc
on a side. To keep the number of models manageable, we 
do not consider the Baugh et al. (2005) model in this paper. 
This model was included in the study by Power et al. (2009). 
The star formation recipe used in the MHIBow06 model is 
based on that used in Baugh et al. (2005). 

The MHIBow06 and GpcBow06 models use the same basic
physical ingredients as the Bow06 model. The Font08
model is based on Bow06, with a modification to the cooling
prescription. We discuss these differences in more detail
below. We first discuss some of the ingredients which are 
varied between the models, in order to introduce some of 
the parameter definitions used in GALFORM. 

All of the models we consider in this paper include the 
suppression of cooling flows in massive haloes, as a result 
of the energy released following accretion of matter onto a 



$^{1}$ http://galaxy-catalogue.dur.ac.uk:8080/Millennium/
$^{2}$ The cosmological parameters used in the Millennium simulation
are a matter density $\Omega_0 = 0.25$, a cosmological constant
$\Lambda_0 = 0.75$, a Hubble constant $H_0 = 73\,{\rm km\,s^{-1}\,Mpc^{-1}}$, a primordial
scalar spectral index $n_{\rm s} = 1$, a baryon density $\Omega_{\rm b} = 0.045$ and a
fluctuation amplitude $\sigma_8 = 0.9$. In the Sanchez et al. (2009) best
fitting model these parameters become $\Omega_0 = 0.26$, $\Lambda_0 = 0.74$,
$H_0 = 71.5\,{\rm km\,s^{-1}\,Mpc^{-1}}$, $n_{\rm s} = 0.96$, $\Omega_{\rm b} = 0.044$ and $\sigma_8 = 0.8$.

Figure 2. The cold gas mass function predicted in the four models at $z=0$ (left), $z=1$ (middle) and $z=2$ (right). Different colours and
line types correspond to different models as indicated by the legend. The points show the local ($z=0$) observational estimate of the
cold gas mass function inferred from the HI mass function of Zwaan et al. (2005) (see text in Section 3.2 for details of the conversion).
These data are reproduced without error bars in the $z=1$ and $z=2$ panels as a reference from which to illustrate the evolution of the
mass function.



central supermassive black hole (Bower et al. 2006; Malbon 
et al. 2007; Fanidakis et al. 2009). A halo is assumed to be 
in quasi-hydrostatic equilibrium if the time required for gas 
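to cool at the cooling radius, $t_{\rm cool}(r_{\rm cool})$, exceeds a multiple
of the free-fall time at this radius, $t_{\rm ff}(r_{\rm cool})$:

$$ t_{\rm cool}(r_{\rm cool}) > \frac{1}{\alpha_{\rm cool}}\, t_{\rm ff}(r_{\rm cool}), \qquad (1) $$

where $\alpha_{\rm cool}$ is an adjustable parameter, whose value controls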
the sharpness and position of the break in the optical lumi- 
nosity function. The cooling flow in the halo is then shut 
down completely if the luminosity released by accretion of 
matter onto the supermassive black hole (SMBH) exceeds 
the cooling luminosity. The energy released by accretion de- 
pends on the mass of the SMBH (see, for example, Fanidakis 
et al. 2009). 
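
As an illustration, the two conditions described above can be written as a short Python sketch. This is not the GALFORM implementation: the function and argument names are hypothetical, the cooling time, free-fall time and luminosities are assumed to be supplied by the caller in consistent units, and the default value of 0.58 for the AGN feedback parameter is the fiducial Bow06 value quoted in the caption of Fig. 4. The first function is the quasi-hydrostatic test of Eq. 1, i.e. the cooling time must exceed the free-fall time divided by the AGN feedback parameter.

```python
def is_quasi_hydrostatic(t_cool, t_ff, alpha_cool=0.58):
    """Eq. 1: True if t_cool(r_cool) > t_ff(r_cool) / alpha_cool."""
    if alpha_cool <= 0.0:
        # alpha_cool = 0 corresponds to no AGN heating: the test never passes.
        return False
    return t_cool > t_ff / alpha_cool


def cooling_flow_shut_down(t_cool, t_ff, l_cool, l_agn, alpha_cool=0.58):
    """Cooling stops only if the halo is quasi-hydrostatic AND the luminosity
    available from accretion onto the SMBH (l_agn, an input assumed here)
    exceeds the cooling luminosity (l_cool)."""
    return is_quasi_hydrostatic(t_cool, t_ff, alpha_cool) and l_agn > l_cool
```

With a free-fall time of 1 (arbitrary units) and the default parameter, a halo passes the first test once its cooling time exceeds roughly 1.72 in the same units.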

The models also include the ejection of cooled gas into 
the hot halo due to heating by supernovae. The strength of 
supernovae feedback is defined by the factor $\beta$:

$$ \beta = \left( V_{\rm hot} / V_{\rm disk} \right)^{\alpha_{\rm hot}}. \qquad (2) $$

The rate at which gas is reheated is $\beta$ times the star formation
rate. Here $V_{\rm disk}$ is the circular velocity of the disk at
its half mass radius, and $V_{\rm hot}$ and $\alpha_{\rm hot}$ are parameters. A
similar equation holds for supernova feedback in the galactic
bulge. In GALFORM, the parameters $V_{\rm hot}$ and $\alpha_{\rm hot}$ are set
without reference to the number of supernovae. The pri- 
mary constraints on these parameters are the shape of the 
luminosity function, the slope of the disk rotation speed - 
luminosity relation and the scale size of disks (see Cole et al. 
2000). 
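
The reheating rate implied by Eq. 2 can be sketched in a few lines of Python. The function name is hypothetical and the numbers in the usage note are chosen only to make the arithmetic easy to follow; they are not the parameter values of any of the models. The two velocities must be given in the same units.

```python
def reheating_rate(sfr, v_disk, v_hot, alpha_hot):
    """Rate at which cold gas is reheated by supernovae.

    beta = (v_hot / v_disk) ** alpha_hot  (Eq. 2), and the reheated mass
    per unit time is beta times the star formation rate `sfr`.
    """
    beta = (v_hot / v_disk) ** alpha_hot
    return beta * sfr
```

For example, with `v_hot` twice `v_disk` and `alpha_hot = 2`, the factor is 4, so a disk forming stars at 2 mass units per unit time would reheat gas at 8 mass units per unit time. Feedback is therefore strongest in disks with low circular velocity.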

The Bow06 and Font08 models use a star formation
timescale, $\tau_\star$, which is proportional to the galactic dynamical
time, $\tau_{\rm dyn}$, and is given by:

$$ \tau_\star = \epsilon_\star^{-1}\, \tau_{\rm dyn} \left( V_{\rm disk} / 200\,{\rm km\,s^{-1}} \right)^{\alpha_\star}, \qquad (3) $$

where $\alpha_\star$ and $\epsilon_\star$ are adjustable parameters ($\alpha_\star = -1.5$
in both cases). The dynamical time is defined as $\tau_{\rm dyn} =
r_{\rm disk}/V_{\rm disk}$. In contrast, the MHIBow06 and the GpcBow06
models adopt a star formation timescale which does not depend
on the galactic dynamical time. Instead, in these cases,
the timescale is given by:

$$ \tau_\star = \tau_\star^0 \left( V_{\rm disk} / 200\,{\rm km\,s^{-1}} \right)^{\alpha_\star}, \qquad (4) $$

where $\tau_\star^0$ and $\alpha_\star$ are adjustable parameters (again, in both
cases, $\alpha_\star = -1.5$); this parameterization was used in Baugh
et al. (2005).
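
The two star formation timescales can be compared with a minimal Python sketch. The function names are hypothetical, velocities are assumed to be in km/s, and the timescale is returned in whatever time unit is used for the dynamical time or for the normalisation. The first function scales the dynamical time by the inverse of the efficiency parameter and by a power of the disk circular velocity (Eq. 3); the second replaces the dynamical time and efficiency with a fixed normalisation (Eq. 4).

```python
def tau_star_dynamical(tau_dyn, v_disk, epsilon_star, alpha_star=-1.5):
    """Eq. 3: timescale proportional to the dynamical time r_disk / V_disk."""
    return tau_dyn / epsilon_star * (v_disk / 200.0) ** alpha_star


def tau_star_fixed(v_disk, tau0_star, alpha_star=-1.5):
    """Eq. 4: timescale independent of the dynamical time."""
    return tau0_star * (v_disk / 200.0) ** alpha_star
```

With the exponent of -1.5 used in all four models, a disk with a circular velocity of 50 km/s has a timescale 8 times longer than a 200 km/s disk in the second form, so low circular velocity disks convert their cold gas into stars more slowly.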

The Font08 model includes an improved treatment of
the ram-pressure stripping of hot-gas atmospheres of satellite
galaxies, motivated by the hydrodynamic simulations of
McCarthy et al. (2008). Also in this model, the yield of metals
per solar mass of stars formed is increased by a factor of
two over the default but rather uncertain value expected for
a standard solar neighbourhood stellar initial mass function.
These changes are motivated in part by the desire to improve
the predictions of the Bow06 model for the colour-magnitude
relation of central and satellite galaxies in groups. The revision
to the stellar yield reddens the colour of all galaxies
in the Font08 model compared with Bow06. The change in
the cooling model changes the relative abundance of galaxies
in the red and blue populations at low luminosities. In
the Font08 model, there are more faint blue satellite galaxies
than in the Bow06 model. These galaxies are starved of
freshly cooled gas in Bow06 and so have redder stellar populations.
The predicted colours in the Font08 model are in
much better agreement with the observed colour-magnitude
relation measured by Weinmann et al. (2006).

The motivation for the MHIBow06 model is clear from 
Fig. 1. This plot shows the galactic neutral hydrogen mass 
to optical luminosity ratio and the cold gas mass function 
at the present day. Note that when we plot the mass function
(lower panels of Fig. 1), cold gas masses are plotted in units
of $h^{-2}\,M_\odot$ rather than $h^{-1}\,M_\odot$, which is the unit used in
the simulation. This ensures that the observational units
(which depend upon the square of the luminosity distance)
are matched. The Bow06 model predicts a gas mass to luminosity
ratio with the wrong zeropoint and slope. Since this
model gives an excellent match to the local optical luminosity
function, the discrepancy in the gas to luminosity ratio
results in a poor match to the cold gas mass function. The
MHIBow06 model uses the star formation timescale given
by Eq. 4 and also adopts a different value for the AGN feedback
free parameter, $\alpha_{\rm cool}$ (Eq. 1; see Table 1). The right
hand panels of Fig. 1 show that the MHIBow06 model is in 
much better agreement with the observed gas to luminos- 
ity ratio and cold gas mass function for cold gas masses in 
excess of ~ 3 x lQ^h~^ Mq. Note that the models predict 
the mass of cold gas, which includes helium, atomic hydro- 
gen and molecular hydrogen. The observed mass function 
in Fig. 1 is measured in terms of the atomic hydrogen (HI) 
content of galaxies. To convert this into a cold gas mass, we 
have assumed a fixed ratio of molecular to atomic hydrogen 
and corrected for the mass fraction of Helium (see Power 
et al. 2009). We shall return to this point in Section 5. 
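
The conversion between HI mass and cold gas mass applied to the observational data can be sketched as follows. The sketch assumes, as in the caption of Fig. 1, that 76 per cent of the cold gas mass is hydrogen and that the molecular to atomic hydrogen mass ratio is fixed at 0.4; the function and constant names are hypothetical.

```python
HYDROGEN_FRACTION = 0.76  # hydrogen mass fraction of the cold gas (Fig. 1 caption)
H2_TO_HI = 0.4            # fixed molecular-to-atomic hydrogen mass ratio


def cold_gas_from_hi(m_hi, h2_to_hi=H2_TO_HI, x_h=HYDROGEN_FRACTION):
    """Total cold gas mass (HI + H2 + helium) implied by an HI mass."""
    return m_hi * (1.0 + h2_to_hi) / x_h


def hi_from_cold_gas(m_cold, h2_to_hi=H2_TO_HI, x_h=HYDROGEN_FRACTION):
    """HI mass implied by a model cold gas mass (inverse of the above)."""
    return m_cold * x_h / (1.0 + h2_to_hi)
```

Under these assumptions the cold gas mass is about 1.84 times the HI mass. A galaxy-dependent ratio, as in the variable case discussed in the Introduction, would replace the constant `h2_to_hi` with a value computed per galaxy.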



The GpcBow06 model starts from the Bow06 model, 
with small adjustments made to the galaxy formation pa- 
rameters to obtain a good match to the optical luminosity 
function (this is required because the cosmological model 
has changed from that used in Bow06) and also to repro- 
duce the observed HI mass function. The GpcBow06 model 
uses Monte-Carlo merger trees generated using the improved 
algorithm devised by Parkinson et al. (2008). 



Fig. 2 shows the cold gas mass function predicted by 
the four models at $z = 0$, 1 and 2. The Bow06 and Font08
models overpredict the abundance of galaxies with a given
cold gas mass at $z = 0$ compared with the observational
estimate by Zwaan et al. (2005). On the other hand, the 
cold gas mass functions of the MHIBow06 and GpcBow06 
models agree well with the local observational estimate for 
masses in excess of l{fi'^h~^ Mq. The discrepancy between 
the predictions and observations at lower masses is not due 
to the finite resolution of the N-body halo merger trees. The 
turnover can be traced back to the modelling of the pho- 
toionisation of the intergalactic medium and the impact this 
has on the cooling of gas in low effective circular velocity 
haloes. In all cases a particularly simple approach is taken 
to model this effect, whereby cooling in low circular velocity
haloes ($v_{\rm c} < v_{\rm cut}$) is suppressed below the redshift at which
the universe is assumed to have been reionised ($z_{\rm cut}$). The
parameters adopted ($v_{\rm cut} \sim 50\,{\rm km\,s^{-1}}$ and $z_{\rm cut} = 6$) may
overestimate the impact of this effect according to recent
simulations by Okamoto, Gao & Theuns (2008). The form
of the observed HI mass function at low masses could give
interesting constraints on the modelling of photoionisation
and supernova feedback (Kim et al., in preparation). Here
we focus on the more massive galaxies which dominate the 
overall HI content of the Universe. 
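
The simple photoionisation rule described above amounts to a single test on the halo circular velocity and redshift. A minimal sketch, with a hypothetical function name and the parameter values quoted in the text (a circular velocity cut of about 50 km/s and a reionisation redshift of 6) as defaults:

```python
def cooling_allowed(v_circ, z, v_cut=50.0, z_cut=6.0):
    """Return False if cooling is suppressed by photoionisation.

    Cooling is switched off in haloes with circular velocity below v_cut
    (km/s) at redshifts below the assumed reionisation redshift z_cut.
    """
    return not (v_circ < v_cut and z < z_cut)
```

A halo with a circular velocity of 30 km/s can therefore cool gas at redshift 8 but not at redshift 3, while a 100 km/s halo is unaffected at all redshifts.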

Figure 3. The cold gas mass of galaxies in the Bow06 model as
a function of the mass of their host dark matter halo. The black 
points show individual galaxies. The symbols joined by lines show 
the median cold gas mass as a function of halo mass, for central 
galaxies (blue), satellite galaxies (red) and all galaxies (black). 
The bars show the 10-90 percentile range of the distribution of 
cold gas masses. All galaxies, including those with zero cold gas 
mass are included when computing the median and percentile 
range. The solid black line shows the cold gas mass a galaxy 
would have if all the available baryons in its halo were in the 
form of cold gas in one object. 

3 THE SPATIAL DISTRIBUTION OF COLD 
GAS 

We now compare the predictions of the four galaxy forma- 
tion models for the spatial distribution of cold gas with one 
another and with observations. To understand the spatial 
distribution of cold gas, we first look at the halo occupation 
distribution (HOD; Benson et al. 2000; Peacock & Smith 
2000; Berlind & Weinberg 2002). This quantifies the num- 
ber of galaxies above a given cold gas mass, as a function 
of dark matter halo mass (§ 3.1). We present predictions for 
the correlation function of galaxies selected by their cold gas 
mass in § 3.2. 

3.1 The halo occupation distribution 

3.1.1 Variation of Cold Gas Mass with Halo Mass 

Before considering the halo occupation distribution directly, 
it is instructive to first look at how the cold gas mass of 
galaxies varies with the mass of their host dark matter halo, 
which we plot in Fig. 3 for the Bow06 model. The median 
cold gas mass as a function of host halo mass is plotted 
separately for central and satellite galaxies. There is a tight 
correlation between the mass of cold gas of a central galaxy 
and its host halo mass for galaxies in haloes less massive 
than ~ 3 X 10^^/i~^Mq. In haloes more massive than this, 
AGN feedback suppresses gas cooling and there is a dramatic 
break in the galaxy cold gas mass - halo mass relation, with 
an accompanying increase in the scatter. The galaxies with 
the largest mass of cold gas do not lie in the most mas- 
sive dark matter haloes, but reside instead in haloes with 
masses ~ 10^^/i"^MQ. The predicted cold gas mass - halo 
mass relation is remarkably similar to that inferred observationally
(Wyithe et al. 2009a). Another conclusion that is
readily apparent from Fig. 3 is that the bulk of the baryons 
associated with a dark matter halo are not in the form of 
cold gas. The solid line in this plot shows the mass a galaxy 
would have if all of the available baryons in the halo were in 
the form of cold gas in one object, assuming the universal 
baryon fraction. The points are some way below this line for 
two reasons: 1) in most haloes, the bulk of the baryons are 
in the hot phase and 2) there is more than one galaxy in 
most haloes. 
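
For orientation, the mass traced by the solid line in Fig. 3 can be estimated as the universal baryon fraction multiplied by the halo mass. The sketch below assumes that this fraction is the ratio of the baryon density to the matter density, using the Millennium simulation values quoted in Section 2 (0.045 and 0.25) as defaults; the function name is hypothetical.

```python
def max_cold_gas_mass(m_halo, omega_b=0.045, omega_m=0.25):
    """Cold gas mass of a single galaxy holding all the baryons in a halo.

    Assumes the halo contains the universal baryon fraction omega_b / omega_m.
    The result is in the same units as m_halo.
    """
    return (omega_b / omega_m) * m_halo
```

With these defaults the baryon fraction is 0.18, so the line lies a factor of about 5.6 below the one-to-one relation between cold gas mass and halo mass.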

3.1.2 Cold Gas Halo Occupation Distributions 

We now examine the predictions for the halo occupation dis- 
tribution (HOD) of galaxy samples constructed according to 
cold gas mass. The HOD gives the mean number of galaxies 
which satisfy a given selection criterion as a function of halo 
mass, and can be broken down into the contribution from
the central galaxy in a halo and its satellite galaxies. In the 
case of optically selected galaxy samples, the HOD is com- 
monly described by a step function for central galaxies and 
a power law for satellite galaxies (Peacock & Smith 2000; 
Seljak 2000; Berlind & Weinberg 2002; Zheng 2004). Many 
attempts have been made to interpret the clustering of opti- 
cally selected galaxy samples using the HOD formalism (van 
den Bosch et al. 2003; Magliocchetti & Porciani 2003; Zehavi
et al. 2005; Yang et al. 2005; Tinker et al. 2007; Wake
et al. 2008; Kim et al. 2009). In contrast, there are few studies
of the clustering of galaxies selected on the basis of their
atomic hydrogen mass using the HOD formalism (Wyithe
et al. 2009a, 2009b; Marin et al. 2009).

Fig. 4 shows the typical form predicted by the mod- 
els for the HOD of galaxies selected by their cold gas mass. 
The left panel shows the HOD for galaxies in the Bow06 
model which have cold gas masses in excess of 3 x lO^/i^^Mg , 
chosen to have the same HI mass cut as HIPASS. For this 
mass threshold, the abundance of central galaxies is sharply 
peaked around a halo mass of ~ 2 x lO^^/i'^M©. The HOD 
of satellite galaxies reaches unity in haloes which are a hun- 
dred times more massive. In these haloes, the central galaxy 
has a cold gas mass below the cut-off; there is essentially zero 
chance of finding a halo which contains a central galaxy and 
a satellite galaxy above this cold gas mass threshold. How- 
ever, this does not imply that it is impossible to find more 
than one galaxy per halo with cold gas masses above the 
threshold, simply that when this occurs (i.e. once a suffi- 
ciently massive halo is considered), both galaxies will be 
satellites. 

For comparison, we also plot in the left hand panel of 
Fig. 4 the traditional form adopted for the HOD of cen- 
tral galaxies (i.e. a step function). The minimum halo mass 
in this case is set by the requirement that the step func- 
tion reproduces the number of central galaxies in the Bow06 
model. The step function HOD is markedly different to the 
predicted HOD, which is closer to a Gaussian. A similar conclusion
about the peaked form of the central galaxy HOD
was postulated by Zehavi et al. (2005) for blue central galax- 
ies. Wyithe et al. (2009a) model the clustering of galaxies 
in the HIPASS survey by adopting a step function for the 
central galaxy HOD and a truncated power law for satellite
galaxies, such that haloes above some mass cut contain
no satellites. The truncation point lies in the halo mass
range W^'^-W^^' h^^ Mq, depending on the slope of the satel- 
lite HOD. As we shall see later on, whilst this truncation is 
not predicted by any of the models, this has little impact on 
the abundance or clustering of the galaxies. 
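
The functional forms discussed above can be sketched in Python as follows. All function and parameter names are hypothetical, and the Gaussian in log halo mass is only a rough stand-in for the peaked central galaxy HOD predicted by the models, not a fit to them. The optional `m_trunc` argument mimics a truncated satellite power law of the kind adopted by Wyithe et al. (2009a).

```python
import math


def n_cen_step(m_halo, m_min):
    """Traditional central HOD: one central per halo above m_min."""
    return 1.0 if m_halo >= m_min else 0.0


def n_cen_peaked(m_halo, m_peak, sigma_log_m, amplitude=1.0):
    """Peaked central HOD: a Gaussian in log10(halo mass)."""
    x = (math.log10(m_halo) - math.log10(m_peak)) / sigma_log_m
    return amplitude * math.exp(-0.5 * x * x)


def n_sat_power_law(m_halo, m_1, slope, m_trunc=None):
    """Satellite HOD: power law reaching unity at m_1.

    If m_trunc is given, haloes more massive than m_trunc host no satellites.
    """
    if m_trunc is not None and m_halo > m_trunc:
        return 0.0
    return (m_halo / m_1) ** slope
```

The key difference is at high halo mass: the step function keeps one central galaxy per halo, whereas the peaked form falls towards zero, which is the behaviour the models predict once gas cooling is suppressed in massive haloes.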

In Fig. 4, the HOD of central galaxies in the Bow06 
model drops far below unity above a halo mass of 
~10^^/i~^Mq. In this model there is very little cold gas in 
haloes more massive than this due to the shut down of the 
cooling flow by AGN heating. To illustrate this, in the mid- 
dle panel of Fig. 4 we vary the halo mass which marks the 
onset of AGN heating by changing the value of the $\alpha_{\rm cool}$
parameter (see Eq. 1). Reducing the value of $\alpha_{\rm cool}$ results in
the halo mass in which cooling stops being shifted to higher
masses. In the absence of AGN heating (i.e. $\alpha_{\rm cool}=0$), the
central galaxy HOD still drops below unity in the most
massive haloes {Mhaio>W^^ h~^MQ) due to the longer cool- 
ing time of the gas in these haloes. These haloes typically 
have a lower formation redshift and thus a lower gas den- 
sity and are also hotter; hence they have a longer cooling 
time. Cold gas is depleted by star formation in such massive 
haloes. 

We shall see later that the peaked HOD for central 
galaxies is common to all of the GALFORM models consid- 
ered, particularly at low redshift. We now examine whether 
or not this feature is peculiar to the way AGN feedback is 
implemented in GALFORM by comparing the Bow06 predictions
with those of De Lucia & Blaizot (2007; hereafter the
DeLucia07 model). The right hand panel of Fig. 4 shows 
that the central galaxy HOD in the DeLucia07 model is 
somewhat broader than that predicted in Bow06, and even 
increases beyond a halo mass of - 2 X 10^-Vi^^Mq. How- 
ever, as we shall demonstrate further on in this section, this 
upturn has little impact on the predicted clustering. The 
suppression of gas cooling in the DeLucia07 semi-analytical 
model is smoother than in GALFORM (see Croton et al. 2006 
for a description of the implementation of radio mode feed- 
back). Some gas is permitted to cool in haloes with hot gas 
atmospheres in the DeLucia07 model, with the cooling rate 
modified by accretion onto the central SMBH. In GALFORM, 
the cooling flow and heating rate are assumed to balance ex- 
actly whenever there is a quasi-hydrostatic hot halo and the 
Eddington luminosity of the black hole exceeds the cooling 
luminosity. 

Figs. 5, 6 and 7 show the HOD in the four Durham 
models at z = 0, 1 and 2. Each column shows the HOD pre- 
dicted for a different cold gas mass threshold, with the mass 
cut increasing to the right. The rows show the different models
introduced in Sec. 2. For the most massive cold gas mass
threshold plotted in Fig. 5, the mean occupation number in
the MHIBow06 and GpcBow06 models is less than 1 galaxy 
per 100 haloes. In the Bow06 model, the HOD peaks at a 
halo mass just under IO^^/i^^Mq, with around 1 in 10 such 
haloes hosting a central galaxy with cold gas mass above the 
threshold. 

The size of the departure from the traditionally assumed
step function HOD for central galaxies at z = 0 in
Fig. 5 varies in proportion to the "strength" of AGN feedback
for the Bow06, Font08 and MHIBow06 models (see
Table 1). Although the GpcBow06 model has the weakest 
AGN feedback, the deviation from a step function is largest 
in this case since this model adopts weaker supernovae feedback

Figure 4. The predicted halo occupation distribution (HOD) of galaxies with cold gas mass in excess of lO^'^/i~^M0, chosen to match
the sample of galaxies for which Wyithe et al. (2009) estimated the HOD in HIPASS. The left panel shows the HOD predicted in the
Bow06 model (solid lines: blue shows the central galaxy HOD, red shows satellites and black shows the overall HOD). The dashed blue
line shows a step function designed to reproduce the number of central galaxies in Bow06. The dashed black line shows this step function
combined with the model HOD for satellites. The central panel shows the impact on the HOD of changing the halo mass above which
AGN feedback stops the cooling flow. The fiducial Bow06 model corresponds to $\alpha_{\rm cool}$ = 0.58. In a model with a larger value of $\alpha_{\rm cool}$,
the onset of cooling suppression can shift to lower mass haloes; reducing $\alpha_{\rm cool}$ means that cooling is only switched off in more massive
haloes. The right hand panel compares the HOD predicted by Bow06 (solid lines) with that in the model of De Lucia & Blaizot (2007),
for the same cold gas mass threshold (dashed lines). The colour coding is the same in each panel.

than the other models (as a result of being set in a
different cosmology, with a lower density fluctuation amplitude).
The departure of the central galaxy HOD from a step
function form is less pronounced at z = 1 (Fig. 6). This is 
because fewer haloes have hot gas atmospheres, and those which do
host lower mass SMBHs (see Fanidakis et al. 2009 for plots
showing how the mass of SMBH is built up over time in the 
models). These trends continue in Fig. 7, which shows the 
HOD for the GALFORM models at z = 2. The HOD of central
galaxies is now better approximated by a step function.
The HODs become noisy for massive haloes as such objects
are extremely rare at this redshift. The central galaxy HOD
in the Font08 model has a Gaussian form centered on halo
masses of a few times 10^^ Mq. The HOD displays an
upturn for more massive haloes which is reminiscent of the
HOD in the DeLucia07 model. In Font08, this mass could
be brought in by merging satellites, which will have a higher 
cold gas mass than in the other Durham models. The central 
galaxy HOD becomes closer to the canonical step function 
form with increasing redshift. 

Fig. 5 shows that the amplitude of the HOD for satellite
galaxies in the Bow06 model is higher than in the MHIBow06
and GpcBow06 models. This is due in part to the
Bow06 model predicting a higher abundance of galaxies by
cold gas mass than is observed (see Fig. 2). The Font08
model predicts many more satellite galaxies than the other
models (~10 times more for the two lowest cold gas mass
thresholds). This can be traced back to the modified cooling
model in Font08, which means that satellites accrete gas
that cools from their incompletely stripped hot haloes. Also 
some of the gas which is reheated by supernovae in the satel- 
lite is allowed to recool onto the satellite rather than being 
incorporated into the main hot halo. The amplitude of the 
HOD for satellite galaxies at z = 1 (Fig. 6) in the Bow06,
MHIBow06, and GpcBow06 models is higher than predicted
at z = 0. Star formation depletes the cold gas by z = 0. The
power law slope of the satellite HOD is remarkably constant 
regardless of cold gas mass threshold, redshift or galaxy for- 
mation model, with A'aat oc M^gf^. The predicted slope is in 
good agreement with the best fitting value determined from 
clustering in the HIPASS sample, with Wyithe et al. (2009a) 
reporting a slope of 0.7 ± 0.4.

Figure 5. The halo occupation distribution, i.e. the mean number of galaxies per halo passing the labelled selection, at z = 0 for galaxy
samples defined by cold gas mass thresholds. The blue dashed curves show the contribution from central galaxies, the red dotted curves
show satellite galaxies and the black solid curves show the overall HOD. Each row corresponds to a different model, and each column to
a different cold gas mass threshold, as labelled.



3.1.3 Comparing HODs for optical and cold gas mass 
selection 

We next compare the model predictions for the HOD of 
an optically selected galaxy sample with those of cold gas 
mass selected samples. Fig. 8 shows the HOD for samples
defined by a cold gas mass threshold of lO^°/i"^M0 in the 
first four columns, with each column showing the predic- 
tions for a different model. In the right hand column, we 
plot the HOD for a sample in which galaxies are selected 
on the basis of their r-band luminosity in the GpcBow06 
model. The optical luminosity cut is chosen such that the 
galaxies brighter than the limit ($M_r - 5\log h < -21.06$)
have the same number density as the sample selected by 
cold gas mass in the GpcBow06 model. As we have already 
remarked, the HODs for the cold gas samples have similar 
properties, with a peaked HOD for central galaxies which
declines rapidly with increasing halo mass, and a power law
HOD for satellites. The HOD for central galaxies in the optical
sample shows a local bump for halo masses just below
10^^ /i"^M0, but overall rises gradually, reaching unity at a
halo mass of ~ 3 × 10^"^ Mq. The bump is due to the
implementation of AGN feedback. The central galaxy HOD 
drops after the bump as AGN feedback "switches on" in 
these haloes. Central galaxies hosted by massive haloes are 
bright in the r-band, whilst possessing too little gas to be 
included in the cold gas sample. 

The remaining rows of Fig. 8 show the steps which con- 
nect the HOD predictions to the effective bias of the galaxy 
samples, which tells us the clustering amplitude. In the lower 
two rows of this plot we have switched to plotting quantities 
on a linear scale. In the second row of Fig. 8, the HOD is 
multiplied by the abundance of the host dark matter haloes, 
giving the contribution to the number density of galaxies as a

Figure 6. The halo occupation distribution at z = 1 for samples defined by a threshold cold gas mass. The blue dashed curves show the
contribution from central galaxies, the red dotted curves show satellite galaxies and the black solid curves show all galaxies. Each row
shows a different model as labelled, using the notation set up in Section 2.



function of halo mass. The abundance of the host dark mat- 
ter haloes is computed using the prescription of Sheth, Mo 
& Tormen (2001), which gives a good match to simulation 
results. Beyond the break in the mass function, the number 
of haloes per unit volume drops exponentially. This means 
that satellite galaxies, whose HOD is described by a moder- 
ate power law, do not contribute significantly to the number 
of galaxies per unit volume. This is true for samples defined 
either by cold gas mass or r-band luminosity. The abundance 
of galaxies is sharply peaked for the cold gas samples. For 
the optical sample, the galaxy number density has a sharp 
peak just below a halo mass of 10^^ Mq and then shows 
a broad distribution and an appreciable contribution from 
more massive haloes. In the bottom row of Fig. 8, we plot the 
number density of galaxies multiplied by the bias factor as 
a function of halo mass (as computed using the prescription 
of Sheth, Mo & Tormen 2001). The square of the bias gives 
the factor by which the auto-correlation function of haloes 



is boosted on large scales relative to the correlation function 
of the dark matter. The halo bias increases rapidly beyond 
the break in the mass function, which increases the influence
of satellite galaxies on the effective bias (e.g. Angulo
et al. 2008b). Nevertheless, for the cold gas mass samples 
satellite galaxies still make a negligible contribution to the 
clustering amplitude on large scales, as quantified by the ef- 
fective bias. Satellite galaxies make a modest contribution 
to the effective bias in the r-band sample. This contribution 
increases if the luminosity cut is made fainter. In summary, 
the models predict that galaxies with cold gas mass in excess 
of IO^^/i^^Mq are predominantly central galaxies hosted by
dark matter haloes of mass lO^^/i"^M0. These haloes are
less massive than the characteristic halo mass at z = 0 in
the cosmologies used and so the bias factor of these sam- 
ples is below unity; they are sub-clustered compared to the 
dark matter. In contrast, the r-band sample has an effective
bias with a significant contribution from more massive

Figure 7. The halo occupation distribution at z = 2. As before, the blue dashed curves show the contribution from central galaxies, the
red dotted curves show satellite galaxies and the black solid curves show all galaxies. Each row shows a different model as described in
Section 2. Each column corresponds to a different cold gas mass threshold as labelled.



haloes which have a larger bias factor. The bias factor for 
the r-band selected samples is therefore greater than unity 
and the clustering length is larger than it is for the cold gas sample
(see Fig. 10 later). 
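
The chain of steps shown in the rows of Fig. 8 can be sketched numerically. The snippet below is a minimal illustration, not the calculation used in the paper: the mass function and halo bias are toy analytic forms standing in for the Sheth, Mo & Tormen (2001) prescriptions, and the names `trapezoid`, `effective_bias`, `M_STAR`, `peaked_hod` and `step_hod` are hypothetical. It weights the HOD by the halo abundance to get the galaxy number density, then weights by the halo bias and normalises to get the effective bias.

```python
import math


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y on the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def effective_bias(ln_mass, hod, dn_dlnm, halo_bias):
    """Return (b_eff, n_gal) for an HOD sampled on a grid in ln(M_halo).

    n_gal = int N(M) dn/dlnM dlnM
    b_eff = int b(M) N(M) dn/dlnM dlnM / n_gal
    """
    per_mass = [n * f for n, f in zip(hod, dn_dlnm)]        # HOD x halo abundance
    n_gal = trapezoid(per_mass, ln_mass)
    biased = [w * b for w, b in zip(per_mass, halo_bias)]   # ... x halo bias
    return trapezoid(biased, ln_mass) / n_gal, n_gal


# Toy inputs (assumptions for illustration only, NOT Sheth-Mo-Tormen):
M_STAR = 1e13                                   # hypothetical break mass
log10_m = [10.0 + 0.05 * i for i in range(101)]
masses = [10.0 ** x for x in log10_m]
ln_mass = [math.log(m) for m in masses]
dn_dlnm = [(M_STAR / m) * math.exp(-m / M_STAR) for m in masses]
halo_bias = [0.7 + 0.3 * math.sqrt(m / M_STAR) for m in masses]

# A peaked central HOD (cold gas like) versus a step function (optical like).
peaked_hod = [math.exp(-0.5 * ((x - 12.0) / 0.3) ** 2) for x in log10_m]
step_hod = [1.0 if x >= 12.0 else 0.0 for x in log10_m]

b_peaked, n_peaked = effective_bias(ln_mass, peaked_hod, dn_dlnm, halo_bias)
b_step, n_step = effective_bias(ln_mass, step_hod, dn_dlnm, halo_bias)
print(f"peaked HOD: b_eff = {b_peaked:.3f}; step HOD: b_eff = {b_step:.3f}")
```

With any halo bias that rises with mass, the peaked HOD gives the lower effective bias, because it places no galaxies in the rare, strongly biased haloes.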

Finally, in Fig. 9 we compare the spatial distribution 
of r-band selected galaxies with that of galaxies chosen on 
the basis of their cold gas mass ($M_{\rm cold}$ > lO^''/i~^M0 in
the GpcBow06 model). Again the r-band magnitude limit
($M_r - 5\log h < -21.06$) is chosen to match the abundance
of galaxies in the cold gas sample. The grey circles represent 
dark matter haloes. The circle radius and darkness are pro- 
portional to halo mass. The cold gas selected galaxies follow 
the filamentary structure and tend to avoid high density re- 
gions. The difference in the number of satellite galaxies (red 
circles) is obvious between the cold gas and optical samples. 
The satellites are found in more massive haloes. This differ- 
ence in the spatial distributions provides a visual impression 
of the differences in the HODs plotted in Fig. 8. The stronger 



clustering of the optical samples in principle means that it 
should be easier to measure the power spectrum of galaxy 
clustering using these tracers. However, the key considera- 
tion, as we shall see in Section 4, is how the product of the 
number density of galaxies and their power spectrum ampli- 
tude changes with redshift. This quantity controls the "con- 
trast" of the power spectrum signal against the noise which 
arises from having discrete tracers of the density field. 



3.2 Predictions for the clustering of cold gas 

In this section we present the predictions of the galaxy for- 
mation models introduced in Section 2 for the two point 
correlation function. To predict the galaxy distribution of 
the GpcBow06 model, we generated galaxy samples using 
the GPICC simulation. 

We start in Fig. 10 by comparing the spatial two point

Figure 8. The steps relating the number of galaxies per halo to the strength of galaxy clustering in the GALFORM models. The first row
shows the HOD as a function of halo mass. The second row shows the HOD multiplied by the abundance of dark matter haloes as a
function of halo mass, $dn/d\ln M_{\rm halo}$ (computed using the prescription of Sheth, Mo & Tormen 2001), with the y-axis plotted on a
linear scale; $n_{\rm tot}$ is the number density of galaxies which satisfy the selection criteria (i.e. in cold gas mass or r-band luminosity). The
integral of these curves is proportional to the number density of galaxies. The bottom row shows the HOD times the halo mass function
times the bias factor as a function of halo mass. The area under the curves in this case gives the effective bias of the galaxy sample. The
first four columns show the model predictions for galaxies with cold gas mass $M_{\rm cold}$ > W°h-'^MQ. The fifth column shows
an r-band selected sample in the GpcBow06 model, with the magnitude limit ($M_r - 5\log h < -21.06$) chosen such that the number of
galaxies matches that in the cold gas sample in this model. As before, the contribution of central galaxies is shown by blue dashed lines,
satellite galaxies by red dotted lines and all galaxies by black solid lines.



autocorrelation function of a galaxy sample defined by a 
threshold cold gas mass ($M_{\rm cold}$ > IQ^^H-'^Mq) in real (solid
black line) and redshift space (dashed black line). The cor- 
relation function is computed in redshift space using the 
distant observer approximation. In this approximation, one 
of the coordinate axes is chosen as the line of sight and the 
peculiar velocity of the galaxy in that direction is added 
to the real space position, after applying a suitable scal- 
ing to convert from velocity units to distance units. For the 
largest pair separations plotted, the correlation function in 
redshift space has the same shape as the real space correlation
function, but a larger amplitude. The magnitude of
the shift in amplitude agrees very closely with the expec- 
tation of Kaiser (1987). This effect is caused by coherent 
bulk flows towards overdense regions. On pair separations 
between 0.3 and $1\,h^{-1}\,$Mpc, the real and redshift space correlation
functions are very similar. They diverge on smaller
scales, where the predictions are noisy simply because there
are few galaxy pairs at these separations.
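
The distant observer mapping described above can be written in a few lines. This is a minimal sketch under stated assumptions: positions are comoving and in $h^{-1}\,$Mpc, velocities are peculiar velocities in km/s, the box is periodic, and the velocity-to-distance scaling is $1/(aH)$ with $H = 100\,h$ km/s/Mpc at z = 0. The function name `to_redshift_space` and its arguments are hypothetical.

```python
H0 = 100.0  # km/s per (h^-1 Mpc); the h dependence cancels for h^-1 Mpc positions


def to_redshift_space(position, velocity, box_size, los_axis=2,
                      scale_factor=1.0, hubble=H0):
    """Shift a galaxy along one coordinate axis by its peculiar velocity.

    s = x + v_los / (a H), wrapped back into the periodic box.
    `hubble` must be H(z) in km/s per (h^-1 Mpc) at the output redshift.
    """
    shifted = list(position)
    displacement = velocity[los_axis] / (scale_factor * hubble)
    shifted[los_axis] = (shifted[los_axis] + displacement) % box_size
    return shifted


# A galaxy moving away from the observer at 500 km/s along the z-axis
# is displaced by 5 h^-1 Mpc at z = 0.
print(to_redshift_space((10.0, 20.0, 30.0), (0.0, 0.0, 500.0), box_size=100.0))
```

Only the line-of-sight coordinate changes, which is why the redshift space correlation function becomes anisotropic while the projected statistics are largely unaffected.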

This behaviour can be contrasted with the clustering 
in the optically selected sample, which is shown by the red 
lines in Fig. 10. As with the cold gas sample, there is a shift

Figure 9. The spatial distribution of galaxies and dark matter haloes in the GpcBow06 model at z = 0. Dark matter is shown in
grey and the size and darkness of the circle used to plot the dark matter halo increase with mass. Galaxies selected by cold gas mass
($M_{\rm cold}$ > W^^h~^ Mq) and r-band luminosity ($M_r - 5\log h < -21.06$) are plotted in the left and right hand panels respectively. The
top row shows a slice of $100\,h^{-1}\,$Mpc on a side and $10\,h^{-1}\,$Mpc thick. The bottom row shows a zoom into a region of $30\,h^{-1}\,$Mpc on a
side and $10\,h^{-1}\,$Mpc thick, which corresponds to the blue square in the top row. The green circles represent central galaxies and the red
circles show satellite galaxies.



in the clustering amplitude when measured in redshift space 
for pair separations $r > 3\,h^{-1}\,$Mpc. However, the size of the
shift is smaller for the optically selected sample, which is 
consistent with the bias of this sample being greater than 
unity and larger than the bias of the cold gas selected sam- 
ple. The real-space correlation function of the optical sample 
is steep on small scales, reflecting the contribution of satel- 
lite galaxies within common dark matter haloes. There is a 
substantial reduction in the clustering amplitude in redshift 



space on these scales in the optical sample, again driven 
by satellite galaxies. This is the so-called "fingers of God" 
redshift space distortion, whereby randomised peculiar ve- 
locities of the satellites within the gravitational potential of 
the cluster make the cluster appear elongated. 
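
The dependence of the large-scale redshift space boost on bias can be illustrated with the linear theory result of Kaiser (1987), $\xi_s/\xi_r = 1 + 2\beta/3 + \beta^2/5$ with $\beta = f/b$, where $f$ is the linear growth rate and $b$ the bias. The sketch below is illustrative only: the growth rate and bias values are placeholders rather than values taken from the models, and the function name `kaiser_boost` is hypothetical.

```python
def kaiser_boost(growth_rate, bias):
    """Linear theory ratio of redshift space to real space correlation function.

    xi_s / xi_r = 1 + (2/3) beta + (1/5) beta^2, with beta = f / b (Kaiser 1987).
    """
    beta = growth_rate / bias
    return 1.0 + 2.0 * beta / 3.0 + beta * beta / 5.0


# Placeholder values: a sub-clustered sample (b < 1) and a biased one (b > 1).
f_example = 0.5
for label, b in (("b = 0.8", 0.8), ("b = 1.3", 1.3)):
    print(label, "boost =", round(kaiser_boost(f_example, b), 3))
```

For a fixed growth rate, the sample with the smaller bias shows the larger fractional boost, which is the sense of the difference between the cold gas and optical samples described above.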

The real space correlation function cannot be estimated 
directly from a galaxy redshift survey. A related quantity is 
the projected correlation function which can be estimated 
from the two point correlation function measured in bins of

Figure 11. The projected correlation function for cold gas mass selected samples at z = 0 (top), 1 (middle) and 2 (bottom). Each column
shows the predictions for a different cold gas mass threshold, as indicated by the label. The predictions of the models are distinguished
by different line types and colours, as shown by the key in the upper left panel. The solid black lines in each panel show the projected
correlation function of the dark matter measured in the Millennium simulation (note that the GpcBow06 model uses a different cosmology
and has different dark matter correlation functions).



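pair separation parallel ($\pi$) and perpendicular ($\sigma$) to the line
of sight, $\xi(\sigma,\pi)$ (e.g. Norberg et al. 2001):

$$\frac{\Xi(\sigma)}{\sigma} = \frac{2}{\sigma}\int_0^{\infty} \xi(\sigma,\pi)\,{\rm d}\pi. \qquad (5)$$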

In the limit that the integral over the radial pair separation 
can be taken to infinity, this quantity is free from redshift 
space distortions (see Norberg et al. 2009 for an illustra- 
tion of the impact of imposing a finite upper limit on the 
integral) . 
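
The projection integral can be checked numerically against a case with a known answer. The sketch below assumes the standard form $\Xi(\sigma)/\sigma = (2/\sigma)\int_0^{\pi_{\rm max}} \xi(\sigma,\pi)\,{\rm d}\pi$ and a toy isotropic power law $\xi(r) = (r/r_0)^{-\gamma}$, for which the infinite integral is analytic; the function names, the substitution $\pi = \sigma\sinh t$ and the parameter values are choices made here for illustration.

```python
import math


def projected_xi_over_sigma(xi, sigma, pi_max, n_steps=2000):
    """Return Xi(sigma)/sigma = (2/sigma) * int_0^pi_max xi(sigma, pi) dpi.

    Uses the substitution pi = sigma*sinh(t) and Simpson's rule in t, which
    samples small and large pi evenly in the logarithm. n_steps must be even.
    """
    t_max = math.asinh(pi_max / sigma)
    h = t_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        t = i * h
        weight = 1 if i in (0, n_steps) else (4 if i % 2 else 2)
        pi = sigma * math.sinh(t)
        total += weight * xi(sigma, pi) * sigma * math.cosh(t)  # dpi = sigma*cosh(t) dt
    return 2.0 / sigma * total * h / 3.0


def power_law_xi(r0, gamma):
    """Toy isotropic correlation function xi(r) = (r/r0)^-gamma."""
    return lambda sigma, pi: (math.hypot(sigma, pi) / r0) ** (-gamma)


def analytic_power_law(sigma, r0, gamma):
    """Closed form of Xi(sigma)/sigma for a power law, pi_max -> infinity."""
    return ((r0 / sigma) ** gamma * math.sqrt(math.pi)
            * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0))


xi = power_law_xi(r0=5.0, gamma=1.8)
full = projected_xi_over_sigma(xi, sigma=1.0, pi_max=1.0e12)
truncated = projected_xi_over_sigma(xi, sigma=1.0, pi_max=50.0)
print(full, analytic_power_law(1.0, 5.0, 1.8), truncated)
```

Comparing `truncated` with `full` gives a feel for how much signal a finite upper limit discards for a slowly declining correlation function.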

Fig. 11 shows the projected correlation function pre- 
dicted in the four models for a range of cold gas mass samples 
at z = 0, 1 and 2. The columns show the results for different
cold gas mass thresholds, and the rows correspond to different
redshifts. The solid black lines in each panel show the
projected correlation function measured for the dark mat- 
ter in the Millennium Simulation (recall that the GpcBow06 
model has a different cosmology and so should be compared 
to a consistent dark matter correlation function which will 
be slightly different from that in the Millennium simulation 
on these scales). Overall, the three lowest mass samples at 
z = 0 are less clustered than the dark matter. The most
massive threshold sample we consider at this redshift has a 
similar clustering amplitude to the dark matter. At z = 1, 
the bias of the three lowest mass samples is close to unity, 
with the projected clustering of galaxies being very close to

Figure 10. The real space (solid) and redshift space (dashed) correlation
function predicted for galaxies in the GpcBow06 model
at z = 0. The black lines show the correlation function of galaxies
with cold gas mass $M_{\rm cold}$ > 10^^ Mq and the red lines show
the clustering of galaxies selected to be brighter than a threshold
r-band luminosity, with the limit chosen to match the abundance
of galaxies in the cold gas sample. The errorbars show the Poisson
error on the pair count in each bin of radial separation.

Figure 12. The projected galaxy correlation function at z = 0.
The points with errorbars show an observational estimate made
from the HIPASS catalogue by Meyer et al. (2007). The lines show
the model predictions for galaxies with $M_{\rm cold}$ >
10^'^/i~^Mq, a threshold chosen to match the selection of galaxies
in the HIPASS sample. The results for different models are
shown by lines with different colours and line types, as indicated
by the key.



that of the dark matter. At z = 2, the cold gas samples 
are more clustered than the dark matter and correspond- 
ingly have effective biases greater than unity. This evolution 
in the bias is due to the adoption of a fixed cold gas mass 
threshold. At high redshift, galaxies with a large cold gas 
mass will tend to be found in more massive haloes. 

Across the different models there is a small spread in 
clustering amplitude for a given cold gas mass sample, with 
remarkably similar predictions made for the projected cor- 
relation function. Fig. 11 shows that the differences start 
to appear at z = 1 and become larger by z = 0. The model
which shows the largest difference from the others is Font08.
On small scales in the two lowest mass threshold samples,
this model has an appreciably higher amplitude projected
correlation function than the other models. This feature can
be traced back to the HODs plotted in Fig. 5. Due to the
revised cooling model used in Font08, there are more satellite
galaxies in the low mass samples in this model, which
boosts the one halo term in the correlation function. 

Finally, we compare the predicted correlation func- 
tions with an observational estimate from Meyer et al. 
(2007), which was made using the HI Parkes All-Sky Survey 
(HIPASS) Catalogue (HICAT; Meyer et al. 2004). In order 
to make this comparison, we need to convert the cold gas 
mass output by the models into an atomic hydrogen mass. 
We assume that 76% by mass of the cold gas is hydrogen. 
Here we adopt a fixed ratio of molecular (H2) to atomic (HI) 
hydrogen of H2/HI = 0.4 (see Power et al. 2009 for a discussion).
The HI mass, $M_{\rm HI}$, is therefore obtained from the cold
gas mass $M_{\rm cold}$ by applying the conversion:

$$M_{\rm HI} = 0.76\,M_{\rm cold}/(1 + 0.4). \qquad (6)$$

With this relation, the sample analyzed by Meyer et al. 
is equivalent to a cold gas mass threshold of M^oid > 
Wp'^h"^ Mq. The comparison between the model predic- 
tions and the observational estimate is presented in Fig. 12. 
The correlation function predicted by the MHIBow06 model 
agrees remarkably well with the observational estimate. The 
GpcBow06 and Bow06 models predict too low a clustering 
amplitude. The Font08 model gives a reasonable match on
intermediate and large scales, but somewhat overpredicts 
the clustering amplitude on small scales, hinting that this 
model has too many gas rich satellites in massive haloes. 
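
The cold gas to HI conversion of Eq. 6, and its inverse for translating an observational HI mass limit into a cold gas mass threshold, can be written as a short helper. The 76% hydrogen mass fraction and the fixed H2/HI = 0.4 ratio are taken from the text; the function and constant names are hypothetical, and the input mass is an arbitrary example value.

```python
HYDROGEN_FRACTION = 0.76   # fraction of cold gas mass in hydrogen (from the text)
H2_TO_HI = 0.4             # fixed molecular-to-atomic hydrogen ratio (from the text)


def hi_mass(cold_gas_mass, hydrogen_fraction=HYDROGEN_FRACTION, h2_to_hi=H2_TO_HI):
    """Eq. 6: M_HI = 0.76 * M_cold / (1 + H2/HI)."""
    return hydrogen_fraction * cold_gas_mass / (1.0 + h2_to_hi)


def cold_gas_threshold(hi_mass_limit, hydrogen_fraction=HYDROGEN_FRACTION,
                       h2_to_hi=H2_TO_HI):
    """Invert Eq. 6: the cold gas mass corresponding to a given HI mass."""
    return hi_mass_limit * (1.0 + h2_to_hi) / hydrogen_fraction


# Example with an arbitrary cold gas mass (same units in and out).
m_cold = 1.4e10
print(hi_mass(m_cold), cold_gas_threshold(hi_mass(m_cold)))
```

With these values, a little over half (0.76/1.4) of the cold gas mass is counted as atomic hydrogen.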



4 MEASURING DARK ENERGY WITH 
FUTURE HI REDSHIFT SURVEYS 

In this section we show how redshift surveys of HI selected 
galaxies can be used to detect baryonic acoustic oscillations 
(BAO) in the galaxy power spectrum, and we assess the 
relative performance of HI and optical surveys in measuring 
the large scale structure of the Universe. 

The BAO signal measured in a sample defined by cold 
gas mass is shown in Fig. 13. We use the galaxy distribution 
in the GPICC simulation generated using the GpcBow06 
model. To show the BAO more clearly, we have divided the 
measured spectrum by a reference power spectrum which 
contains no wiggles. For the linear theory prediction, which 
is shown by the curves in Fig. 13, the reference is based 
on the "no wiggle" parametrization of the power spectrum

Figure 13. The baryonic acoustic oscillations in the galaxy power
spectrum. To display the BAO more clearly, we have divided the
predicted spectra by smooth fits, as described in the text. The
points show the power spectra predicted by the GpcBow06 model
at z = 2 (bottom), z = 1 (middle) and z = 0 (top). The left
hand column shows the power spectra measured for galaxies with
cold gas mass ($M_{\rm cold}$ > l0^^h~^TsAQ). The right hand column
shows the BAO in a sample selected in the r-band with the same
number density of galaxies as the cold gas sample at that redshift.
The smooth green line shows the linear theory power spectrum,
divided by a smooth reference power spectrum, after filtering or
"de-wiggling" to damp the higher harmonics (see text). The errors
plotted on the power spectrum depend on the number density of
galaxies and the simulation volume (see eq. 3 in Angulo et al.
2008a).



given by Eisenstein & Hu (1998). The no wiggle prediction 
includes the impact of a non-zero baryon component on 
the width of the turn-over in the matter power spectrum.
The ratio of the linear theory power spectrum, $P^{\rm L}(k)$, to
the no wiggle prediction, $P^{\rm L}_{\rm nw}(k)$, is "de-wiggled" by damping
the oscillations to represent the impact of nonlinear growth
and redshift-space distortions (e.g. Eisenstein et al. 2005;
Sanchez, Baugh & Angulo 2008):

$$R_{\rm lin}(k) = \left(\frac{P^{\rm L}(k)}{P^{\rm L}_{\rm nw}(k)} - 1\right) \times \exp\left(-\frac{k^2}{2k_{\rm nl}^2}\right) + 1, \qquad (7)$$

where $k_{\rm nl}$ is the damping scale and is treated as a free parameter.
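
The de-wiggling step can be sketched as a one-line function. This assumes the Gaussian damping form commonly used for this purpose, in which the departure of the ratio from unity is multiplied by $\exp(-k^2/2k_{\rm nl}^2)$; the function name `dewiggle` and the numerical values below are illustrative placeholders, not fitted values.

```python
import math


def dewiggle(ratio, k, k_nl):
    """Damp BAO wiggles in ratio = P_L(k) / P_nw(k).

    R(k) = (ratio - 1) * exp(-k^2 / (2 k_nl^2)) + 1, so the oscillation
    amplitude about unity is untouched at small k and erased for k >> k_nl.
    """
    return (ratio - 1.0) * math.exp(-k * k / (2.0 * k_nl * k_nl)) + 1.0


# A 5 per cent wiggle is progressively suppressed towards higher k
# (k and k_nl share the same units; the values are placeholders).
for k in (0.0, 0.1, 0.2, 0.4):
    print(k, dewiggle(1.05, k, k_nl=0.15))
```

Treating `k_nl` as a free parameter, as in the text, lets the fit absorb the stronger damping expected at low redshift or in redshift space.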

The overall shape of the power spectra measured from 
the simulation is different from the linear theory prediction 
due to the nonlinear growth of fluctuations and redshift- 
space distortions (see Angulo et al. 2008a for a step by step 
illustration of these effects). We model this change in shape 
by multiplying the no-wiggle version of the linear theory 
spectrum by a third order polynomial: 

$$P_{\rm s}(k) = (1 + Ak + Bk^2 + Ck^3)\,P^{\rm L}_{\rm nw}(k). \qquad (8)$$

The free parameters A, B and C are chosen to give the best

Figure 14. The quantities needed to compute the effective volume
of a redshift survey, as predicted in the GpcBow06 model.
The upper panel shows the number density of HI galaxies for different
SKA configurations, with the collecting area and survey
duration given in the legend. The red and blue histograms show
predictions for the full SKA collecting area. The field of view is
assumed to be 200 square degrees in all cases, with the full survey
covering one hemisphere. Heavy lines show the model predictions
for a fixed H2/HI conversion, with thin lines of the same colour
and style showing the predictions for a variable H2/HI ratio. The
middle panel shows the effective bias as a function of redshift for
the corresponding cases. The lower panel shows the product of the
galaxy number density and galaxy power spectrum. The value of
$P_{\rm gal}(k)$ at $k = 0.2\,h\,{\rm Mpc}^{-1}$ is plotted. The green curves show the
predictions for a spectroscopic survey down to H = 22 (assuming
a 33% redshift success rate; green dot-long-dashed line) and
a slitless survey of H-α down to a flux limit of 5 × lO^'^^ erg s$^{-1}$
cm$^{-2}$, again with a 33% redshift measurement rate (green
dot-short-dashed line); both these results are taken from Orsi et al.
(2009).



match to the overall shape of the measured power spectrum. 
All points up to $k = 0.4\,h\,{\rm Mpc}^{-1}$ were included in the fit and
given equal weight. This approach is more straightforward
and robust than using a spline fit to a coarsely binned measured
spectrum, which is sensitive to the number of k-bins
used.
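
A fit of this kind can be sketched as an ordinary linear least-squares problem. The snippet below is one possible set-up rather than the procedure used in the paper: it assumes the polynomial multiplies a smooth no-wiggle spectrum, fits the ratio $P/P_{\rm nw} - 1 = Ak + Bk^2 + Ck^3$ with equal weights for all points up to `k_max`, and solves the 3x3 normal equations directly. All names (`solve_linear`, `fit_shape_polynomial`) and the synthetic spectrum are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_shape_polynomial(k, p_measured, p_nowiggle, k_max=0.4):
    """Equal-weight least-squares A, B, C in P/P_nw - 1 = A k + B k^2 + C k^3."""
    rows = [([ki, ki ** 2, ki ** 3], pm / pn - 1.0)
            for ki, pm, pn in zip(k, p_measured, p_nowiggle) if ki <= k_max]
    normal = [[sum(r[i] * r[j] for r, _ in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in rows) for i in range(3)]
    return solve_linear(normal, rhs)


# Synthetic check: build a "measured" spectrum with known A, B, C and recover them.
k_grid = [0.01 * i for i in range(1, 61)]
p_nw = [1.0e4 / (1.0 + (ki / 0.05) ** 2) for ki in k_grid]   # toy smooth spectrum
true_abc = (0.5, -1.2, 2.0)
p_meas = [(1.0 + true_abc[0] * ki + true_abc[1] * ki ** 2 + true_abc[2] * ki ** 3) * pn
          for ki, pn in zip(k_grid, p_nw)]
fitted = fit_shape_polynomial(k_grid, p_meas, p_nw)
print(fitted)
```

Because the model is linear in A, B and C, there is no dependence on a starting guess or on a choice of binning, which is the robustness advantage noted above relative to a spline.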

We show in Fig. 13 the BAO signal in the GpcBow06 
model at z = 0, 1 and 2 for galaxies selected by their cold gas 
mass (Mcoid > 10'^°/i~^Mq; left column), as an illustration 
of how a cold gas mass selected sample traces this large-scale 
structure feature. In the right-hand column of Fig. 13, we
compare this with the BAO signal expected for an r-band
selected sample of galaxies that have the same number density
at each redshift as the cold gas sample. The reference power 
spectrum is defined as described above, using the third or- 
der polynomial fit to the measured spectrum in each case. 
Fig. 13 shows that we should be able to measure the BAO 
feature just as well using a sample selected by cold gas mass 
as with an optically selected sample.

Figure 15. The effective volume per steradian of HI selected samples
predicted in the GpcBow06 model. The upper panel shows
the differential effective volume divided by the geometrical volume
for narrow bins in redshift. The lower panel shows the cumulative
volume. The results for different SKA configurations are shown
by different line styles and colours as indicated by the key. The
green curves show the predictions for a spectroscopic survey down
to H = 22 (33% redshift success rate; green dot-long-dashed line)
and a slitless survey down to an H-α flux limit of 5 × 10~^^ erg
s$^{-1}$ cm$^{-2}$ (green dot-short-dashed line), as computed by Orsi
et al. (2009). The black solid line in the lower panel shows the
available geometrical volume per steradian. The optical and HI
surveys are assumed to cover approximately the same solid angle,
one hemisphere.



Many ongoing and proposed redshift surveys have the 
goal of determining the nature of dark energy by measuring 
the BAO signal in the galaxy power spectrum. A power- 
ful way to compare the expected performance of different 
surveys for measuring large-scale structure is to estimate 
their effective volumes (see, for example, Orsi et al. 2009). 
This is essentially an indicator of the "useful" survey volume 
which determines the size of the errorbar on the measured 
power spectrum. The effective volume is defined as (Feldman,
Kaiser & Peacock 1994)

$$V_{\rm eff}(k,z) = \int \left[\frac{n(z)P_{\rm g}(k,z)}{1 + n(z)P_{\rm g}(k,z)}\right]^2 \frac{{\rm d}V}{{\rm d}z}\,{\rm d}z, \qquad (9)$$



where all quantities are expressed in comoving coordinates 
and dV/dz is the differential comoving volume. To calculate 
the effective volume, we therefore need to know the number 
density of galaxies ($n(z)$) down to a given survey flux limit
and the effective bias ($b(z)$), both as functions of redshift. In
this calculation, we obtain the galaxy power spectrum using
the linear relation between galaxy bias and the dark matter
power spectrum: $P_{\rm g}(k,z) = P_{\rm dm}(k,z=0)\,b^2(z)\,D^2(z)$, where
$P_{\rm g}$ is the galaxy power spectrum, $P_{\rm dm}(k,z=0)$ is the linear
theory dark matter power spectrum at $z = 0$, $b(z)$ is the



effective bias, and D{z) is the growth factor of the dark 
matter. 
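
The effective volume calculation can be sketched as a weighted integral over redshift. The snippet below assumes the Feldman, Kaiser & Peacock (1994) weighting $[nP/(1+nP)]^2$ applied to ${\rm d}V/{\rm d}z$, together with the linear bias relation $P_{\rm g} = P_{\rm dm}(z=0)\,b^2 D^2$ quoted above. The function names and all input arrays are placeholders for illustration, not model output.

```python
def galaxy_power(p_dm_z0, bias, growth):
    """P_g(k, z) = P_dm(k, z=0) * b(z)^2 * D(z)^2 at a fixed wavenumber k."""
    return p_dm_z0 * bias ** 2 * growth ** 2


def effective_volume(z, n_gal, p_gal, dv_dz):
    """Trapezoidal integral of [nP / (1 + nP)]^2 dV/dz over the redshift grid z."""
    integrand = []
    for n, p, dv in zip(n_gal, p_gal, dv_dz):
        n_p = n * p
        integrand.append((n_p / (1.0 + n_p)) ** 2 * dv)
    return sum(0.5 * (integrand[i] + integrand[i + 1]) * (z[i + 1] - z[i])
               for i in range(len(z) - 1))


# Placeholder inputs: constant dV/dz and a number density that falls with redshift.
z_grid = [0.1 * i for i in range(21)]                     # z = 0 ... 2
dv_dz = [1.0e9] * len(z_grid)                             # arbitrary volume units per unit z
n_gal = [1.0e-2 * 10.0 ** (-1.5 * zi) for zi in z_grid]   # toy n(z)
p_gal = [galaxy_power(2.0e3, 1.0 + 0.25 * zi, 1.0 / (1.0 + zi)) for zi in z_grid]

v_geo = effective_volume(z_grid, [1.0e6] * len(z_grid), p_gal, dv_dz)  # nP >> 1 limit
v_eff = effective_volume(z_grid, n_gal, p_gal, dv_dz)
print(v_eff / v_geo)
```

Where $nP \gg 1$ the weight tends to one and the effective volume tracks the geometrical volume; where $nP = 1$ the weight is 1/4, and it falls rapidly once shot noise dominates.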

To make predictions for the effective volume of the SKA, 
we need to convert the cold gas mass predicted by the models 
into an HI line flux, which we do following the prescription 
set out in Power et al. (2009). A key step is the assumption 
about the fraction of neutral hydrogen which is in molecular 
form as opposed to atomic hydrogen. Power et al. (2009) 
adopted two prescriptions: a fixed fraction of 40% as used 
by Baugh et al. (2005) and a variable fraction as used by 
Obreschkow & Rawlings (2009), based on work by Blitz & 
Rosolowsky (2006), in which this ratio can vary from galaxy 
to galaxy. We shall refer to these two scenarios as the fixed 
and variable H2/HI ratio cases. Power et al. (2009) showed
that the high redshift tail of the count distribution in the
variable H2/HI case is substantially suppressed compared
with the fixed H2/HI ratio case.

We calculate the effective volume for possible SKA survey
configurations using the GpcBow06 model. The model
gives the number density of galaxies at different redshifts 
brighter than the specified flux limit and the effective bias 
of these galaxies. The flux limit for a given collecting area 
and integration time is computed as described in Power et al. 
(2009). As explained above, we follow the procedure given 
by Power et al. (2009) to predict the flux of galaxies at 21cm 
and hence the number which can be detected with a given 
SKA set-up. Fig. 14 shows the predictions for the quantities 
required to calculate the effective volume for various SKA 
configurations. We calculate the effective volume for three 
different survey integration times and collecting areas, assuming
a 200 deg$^2$ field of view. The top panel of Fig. 14
shows the number density of sources, n{z), as a function of 
redshift. The galaxy number density for a 3 year survey with 
a 1 km$^2$ collecting area is nearly constant up to z = 2; for an
integration time of just 1 year with the same collecting area,
the number density of galaxies detected declines rapidly beyond
z ~ 2. If the collecting area is much smaller, 0.1 km$^2$,
a 1-yr survey for SKA probes z < 1.

The middle panel of Fig. 14 shows that the bias in the 
three cases described above changes by a much more modest 
amount than the number density of galaxies does, increasing 
by 50% with redshift over the range plotted. The increase in 
effective bias cannot therefore compensate for the dramatic 
drop in the abundance of galaxies in the high redshift tails 
of the distributions. The effective volume of a survey config- 
uration no longer increases with redshift once the product of 
the galaxy number density and the galaxy power spectrum 
drops below unity. In this regime, the power spectrum signal
is swamped by shot noise ($P_{\rm shot} = 1/n$) arising from the use
of discrete galaxies to trace the continuous density field, and
does not contribute to the statistical power of the survey.
The product nP is plotted in the lower panel of Fig. 14. The 
different survey configurations track the geometrical volume 
available until the redshift at which nP < 1. This is clear 
from the lower panel of Fig. 15, in which the effective volume 
curves flatten once this redshift is reached. The thick lines
show the effective volume expected for a fixed H$_2$/HI ratio.
The thin lines show that the effective volume sampled drops
by a factor of two when a variable H$_2$/HI ratio is adopted.
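
To illustrate why the effective volume stops growing once nP falls below unity, the short Python sketch below assumes the usual weighting, in which each redshift shell contributes its comoving volume multiplied by [nP/(1 + nP)]^2. The function name and the toy shell values are illustrative assumptions; they are not outputs of the GpcBow08 model.

```python
def effective_volume(shell_volumes, number_densities, power):
    """Sum V_shell * [nP / (1 + nP)]^2 over redshift shells.

    shell_volumes    : comoving volume of each shell
    number_densities : galaxy number density n in each shell
    power            : galaxy power spectrum P at the wavenumber of interest,
                       in units that make n * P dimensionless
    """
    total = 0.0
    for volume, n, p in zip(shell_volumes, number_densities, power):
        np_product = n * p
        weight = (np_product / (1.0 + np_product)) ** 2
        total += volume * weight
    return total


# Toy numbers (hypothetical, not model output): three shells of equal
# volume in which n * P falls from 10 to 1 to 0.1.
volumes = [1.0, 1.0, 1.0]
densities = [1e-3, 1e-4, 1e-5]
power = [1e4, 1e4, 1e4]

# Fraction of each shell's geometrical volume that is effectively sampled.
weights = [effective_volume([1.0], [n], [p]) for n, p in zip(densities, power)]
v_eff = effective_volume(volumes, densities, power)
```

With these toy numbers the shell with nP = 1 contributes only a quarter of its geometrical volume, and the shell with nP = 0.1 contributes less than one per cent, which is the flattening behaviour described above.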

We include in Fig. 14 two predictions for redshift sur- 
veys conducted in the near-infrared taken from Orsi et al. 
(2009), who followed the same procedure we have set out 
above, but for different galaxy selection criteria. The
predictions for a redshift survey to H = 22 with a 33%
redshift sampling rate (green dot-long-dashed line) and for a
slitless survey to an Hα flux limit of $5 \times 10^{-16}$ erg s$^{-1}$
cm$^{-2}$, again with a 33% redshift measurement rate (green
dot-short-dashed line), are plotted for comparison. Leaving
aside cost considerations and technical feasibility, this
comparison shows that a 1-year SKA survey is comparable to
the one third sampling H = 22 survey, and samples around
5 times the volume of the Hα survey.

The effective volume gives a broad brush view of the 
potential performance of a survey. In order to get a more 
quantitative impression, we need to make a forecast of the 
error on the parameter of interest, which in our case is the 
dark energy equation of state parameter, w. This will allow 
us to assess if the volume sampled by the survey is at a 
redshift which is useful for constraining the value of w. The 
conclusions will depend to some extent on the dark energy 
model adopted. The fiducial model we use is a flat cold dark
matter universe with a cosmological constant. The
cosmological constant has little impact on cosmological distances
above $z \approx 1.5$-2. Hence, a difference in effective volume
between survey configurations at these redshifts is likely to
have little impact on how well w can be measured. This
behaviour could change if we adopted a different dark energy
model, such as one with appreciable amounts of dark
energy at early epochs (see, for example, the plots of Hubble
parameter and luminosity distance in Jennings et al. 2010).
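
The declining influence of the dark energy with redshift can be checked with a few lines of Python. The sketch below assumes a flat universe with a constant equation of state w and evaluates the fraction of the total energy density contributed by the dark energy. The value of the matter density parameter (0.25) and the function names are illustrative assumptions rather than the parameters of the fiducial model.

```python
def hubble_ratio_squared(z, omega_m=0.25, w=-1.0):
    """E(z)^2 = H(z)^2 / H0^2 for a flat universe with constant w."""
    omega_de = 1.0 - omega_m
    return omega_m * (1.0 + z) ** 3 + omega_de * (1.0 + z) ** (3.0 * (1.0 + w))


def dark_energy_fraction(z, omega_m=0.25, w=-1.0):
    """Fraction of the total energy density in dark energy at redshift z."""
    omega_de = 1.0 - omega_m
    de_term = omega_de * (1.0 + z) ** (3.0 * (1.0 + w))
    return de_term / hubble_ratio_squared(z, omega_m, w)


# Cosmological constant case (w = -1) with the illustrative omega_m = 0.25.
fractions = {z: dark_energy_fraction(z) for z in (0.0, 0.5, 1.0, 2.0)}
```

For these illustrative parameters the dark energy supplies 75 per cent of the energy density at z = 0 but only 10 per cent at z = 2, so the expansion rate at the highest redshifts is dominated by the matter term.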

To make the forecast of the error on w for a particu- 
lar survey configuration, we use a Fisher matrix approach, 
closely following the calculation in Seo & Eisenstein (2003).
Our goal is to compare the different survey configurations, 
so we use a number of approximations to simplify the calcu- 
lation. In particular, we work in the flat sky approximation, 
ignore the impact of redshift space distortions on the ap- 
pearance of the BAO and ignore any evolution of the power 
spectrum over bins of redshift of width 0.1. Under these 
assumptions, the Fisher matrix (for arbitrary parameters) 
obtained from the power spectrum is given by (Tegmark, 
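Taylor & Heavens 1997; Seo & Eisenstein 2003),

\[ F_{ij} = \int_{k_{\rm min}}^{k_{\rm max}} \frac{\partial \ln R}{\partial p_i}\, \frac{\partial \ln R}{\partial p_j}\, V_{\rm eff}(k, z)\, \frac{{\rm d}^3k}{2\,(2\pi)^3}, \]
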
where $R$ is the measured power spectrum divided by a
smooth reference, as given by Eq. 7, and the effective
volume, $V_{\rm eff}(k, z)$, is given by Eq. 4. The integration is over
the wavenumber interval $k_{\rm min} = 0.02\,h\,{\rm Mpc}^{-1}$ to
$k_{\rm max} = 0.2\,h\,{\rm Mpc}^{-1}$. To isolate the cosmological constraints which
come from the BAO scale, we ignore any information stored
in the amplitude of the power spectrum and assume the
power spectrum is sensitive to w only through the observed
angular and radial distance scales. The explicit dependence,
as given in Seo & Eisenstein (2003) is,

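\[ P_{\rm obs}(k_{{\rm ref}\perp}, k_{{\rm ref}\parallel}, z) = \frac{D_A(z)_{\rm ref}^2\, H(z)}{D_A(z)^2\, H(z)_{\rm ref}}\, P_{\rm true}(k_\perp, k_\parallel, z), \]

where $k_{{\rm ref}\perp} = k_\perp D_A(z)/D_A(z)_{\rm ref}$ and
$k_{{\rm ref}\parallel} = k_\parallel H(z)_{\rm ref}/H(z)$ relate the wavenumbers inferred via
an assumed cosmological model and the true physical scales
in the power spectrum.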
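
The structure of such a Fisher forecast can be sketched in Python for a single parameter. The example below is a simplified illustration under stated assumptions and is not the calculation used for our results: the ratio R(k) is replaced by a hypothetical damped sinusoid, the two distance scales are collapsed into one isotropic dilation parameter alpha, and the effective volume is taken to be independent of wavenumber. The sound horizon, wiggle amplitude and damping scale are placeholder values.

```python
import math


def wiggle_ratio(k, alpha, r_s=105.0, amplitude=0.05, damping=8.0):
    """Toy stand-in for R(k): a damped sinusoid around unity (hypothetical).

    alpha rescales the wavenumber, mimicking a change in the distance scale.
    """
    envelope = math.exp(-((k * damping) ** 2) / 2.0)
    return 1.0 + amplitude * math.sin(alpha * k * r_s) * envelope


def fisher_alpha(v_eff, k_min=0.02, k_max=0.2, n_steps=2000, eps=1e-4):
    """F = int (dlnR/dalpha)^2 V_eff 4 pi k^2 dk / (2 (2 pi)^3).

    Midpoint rule in k; the derivative is a central finite difference
    about alpha = 1. v_eff is assumed constant in k (a simplification).
    """
    dk = (k_max - k_min) / n_steps
    total = 0.0
    for i in range(n_steps):
        k = k_min + (i + 0.5) * dk
        ln_plus = math.log(wiggle_ratio(k, 1.0 + eps))
        ln_minus = math.log(wiggle_ratio(k, 1.0 - eps))
        dlnr = (ln_plus - ln_minus) / (2.0 * eps)
        total += dlnr ** 2 * v_eff * 4.0 * math.pi * k ** 2 * dk
    return total / (2.0 * (2.0 * math.pi) ** 3)


def sigma_alpha(v_eff):
    """Forecast 1-sigma error on alpha for one parameter: 1 / sqrt(F)."""
    return 1.0 / math.sqrt(fisher_alpha(v_eff))


# Effective volumes in (Mpc/h)^3; the values are illustrative only.
sigma_small = sigma_alpha(1.0e9)
sigma_large = sigma_alpha(4.0e9)
```

Because the Fisher element in this sketch is linear in the effective volume, quadrupling the volume halves the forecast error. Differences in effective volume matter for the error on w only to the extent that the extra volume lies at redshifts where the distance scale is sensitive to w.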

Our calculation is idealised since we hold the values of 
the other cosmological parameters fixed. Angulo et al. (2008) 
showed that making such an assumption can have an impact 
on the size of the uncertainty inferred on w given an error 
on the BAO distance scale. Here we are interested in the 
relative error on w between different survey configurations,
which we assume is robust to whether or not we vary other
parameters. We find that the error on w, $\Delta w$, is fairly
insensitive to the assumption about the ratio H$_2$/HI. Even
though the tails of the redshift distribution of HI emitters
are significantly different in these cases, and the effective
survey volumes differ, this has little impact on $\Delta w$. Also,
there is little difference in the accuracy achievable with 1
year and 3 year surveys. Finally, we compare the
performance of HI redshift surveys with that of surveys in the
near infrared. The H = 22 survey is predicted to yield an
equivalent $\Delta w$ to the 1 year HI SKA survey. The Hα
survey is expected to give a $\Delta w$ that is twice as large as
that of the H = 22 survey.



5 SUMMARY AND CONCLUSIONS 

The cold gas content of galaxies and its variation with halo 
mass lie at the core of the galaxy formation process. The 
amount of cold gas in a galaxy is set by the balance be- 
tween a number of competing processes. The cold gas sup- 
ply comes from the cooling of gas from the hot halo and the 
accretion of cold gas following mergers with other galaxies. 
Star formation and supernova feedback act as sinks of cold 
gas. Semi-analytical simulations model all of these processes 
in the context of structure formation in the dark matter and 
so are ideally suited to make predictions for the distribution 
of cold gas in haloes of different mass. Since the models can 
make a wide range of predictions, their parameters are set
by the requirement that a variety of observed galaxy
properties be reproduced, not just the local HI data. The model
predictions can be tested by measurements of the
clustering of HI-selected galaxy samples, and are necessary to plan
surveys to measure the large-scale structure of the Universe
with the next generation of radio telescopes. 

In this paper we have compared the predictions for 
the distribution of cold gas in dark matter haloes of four 
versions of the Durham semi-analytical galaxy formation 
model, GALFORM. The Bower et al. (2006) and Font et al. 
(2008) models are publicly available from the Millennium 
Archive. These models overpredict the local abundance of 
galaxies as a function of their cold gas mass. This excess was 
straightforward to fix, with the primary adjustment made to 
the model star formation timescale. This modified model, 
based on Bower et al. (2006) was still able to reproduce the 
quality of match to the optical luminosity functions enjoyed 
by Bower et al. We also considered a galaxy formation model 
set in a different cosmology, to take advantage of an N-body
simulation with a large enough box size to accurately model 
baryonic acoustic oscillations. This model also adopted a 
modified star formation timescale to better match the local 
HI mass function. 

The model predictions have several features in common. 
In agreement with observations, satellite galaxies are
relatively unimportant in samples selected by their cold gas
mass. This is true even in the Font et al. (2008) model in 
which satellites retain some of their hot haloes, depending on 
their orbit within the main halo, and can continue to accrete 
cooling gas. Samples constructed according to a cold gas 
mass threshold are dominated by central galaxies in haloes
around $10^{11}\,h^{-1}M_\odot$. The halo occupation distribution of
central galaxies is peaked in halo mass, rather than being a step
function, as is the case for optical samples. As the cold gas 
mass cut is increased, the width of the central galaxy HOD 
increases and the amplitude drops. The peaked nature of the 
HOD of central galaxies is due to suppression of gas
cooling in massive haloes following heating by AGN. We found
the same general form for the HOD in a model by De Lucia
& Blaizot (2007), in which the implementation of AGN/radio
mode feedback is different from that in GALFORM. 

The relative importance of central and satellite galaxies 
has an impact on the form of the predicted correlation func- 
tion. The correlation function of a galaxy sample selected 
by cold gas mass is remarkably similar on small scales in 
real and redshift space. For pair separations in excess of a
few Mpc, the redshift space correlation function has a higher 
amplitude than in real space, as expected given the effective 
bias of the sample (Kaiser 1987). In contrast, for an optically 
selected sample with the same number density of galaxies, 
the correlation steepens in real space for $r < 1\,h^{-1}$Mpc and is
damped in redshift space on these scales, due to the greater 
influence of satellite galaxies in massive haloes. On larger 
scales there is a more modest boost in the clustering ampli- 
tude in redshift space, due to the larger effective bias of the 
optical sample. The clustering predictions are in reasonable 
agreement with the measurements by Meyer et al. (2007). 
The clustering in the modified version of the Bower et al. 
model (MHIBow06) best agrees with the HIPASS results.
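
The dependence of the large-scale redshift space boost on the effective bias can be illustrated with the linear theory result of Kaiser (1987), in which the ratio of the redshift space to real space correlation function is 1 + 2β/3 + β^2/5 with β = f/b. In the Python sketch below, the bias values, the matter density parameter and the approximation f ≈ Ω_m^0.55 for the growth rate are illustrative assumptions rather than quantities measured from the models.

```python
def kaiser_boost(bias, growth_rate):
    """Linear theory ratio of the redshift space to the real space
    correlation function on large scales: 1 + 2*beta/3 + beta^2/5,
    with beta = growth_rate / bias."""
    beta = growth_rate / bias
    return 1.0 + 2.0 * beta / 3.0 + beta ** 2 / 5.0


# Growth rate at z = 0 from f ~ Omega_m^0.55 (illustrative Omega_m = 0.25).
f_growth = 0.25 ** 0.55

# Hypothetical effective biases: a weakly biased gas-selected sample and
# a more strongly biased optical sample.
boost_low_bias = kaiser_boost(0.8, f_growth)
boost_high_bias = kaiser_boost(1.2, f_growth)
```

In this sketch the sample with the lower effective bias receives the larger boost, in line with the more modest enhancement found above for the optically selected sample.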

One of the primary science goals of the Square Kilome- 
tre Array (SKA) is to make a high precision measurement of 
large-scale structure in the galaxy distribution. By measur- 
ing the apparent size of baryonic acoustic oscillations (BAO) 
at a particular redshift, the cosmological distance to that 
redshift can be derived, thereby constraining the equation 
of state of the dark energy. By combining the galaxy for- 
mation model with a very large volume N-body simulation 
($1\,h^{-3}\,{\rm Gpc}^3$), we have been able to demonstrate that galaxy
samples constructed on the basis of cold gas mass can trace
the BAO with the same fidelity as a near-infrared selected
sample with the same number density of galaxies. 

The key remaining question is how effectively HI and
optical redshift surveys sample the available geometrical
volume, and how this translates into an error on the dark
energy equation of state parameter. The effective
survey volume varies substantially between HI surveys of 
different duration and for different assumptions about the 
split between atomic and molecular hydrogen. However, at 
least for the case of a cosmological constant, these differ- 
ences occur in a redshift range which has little impact on 
the derived error on the equation of state. We find that HI 
surveys are comparable to the most ambitious near-infrared 
spectroscopic surveys currently under discussion, and will 
give a factor of two smaller error on w than a slitless Hα
redshift survey; all are bona fide Stage IV experiments in
the Dark Energy Task Force nomenclature (Albrecht et al.
2006). The uncertainty in the ratio of molecular to atomic
hydrogen is one of the major uncertainties at present, and
leads to larger differences in the predicted counts of HI emit- 
ters than the choice of galaxy formation model. The frac- 
tion of molecular hydrogen is thought to depend upon the 
local conditions in the interstellar medium. This question 
requires further modelling (e.g. Krumholz, McKee & Tum- 
linson 2009), augmented by observations of the HI and CO 
distribution in nearby galaxies, for example by HI surveys 
on the SKA pathfinder MeerKAT and CO measurements 
using the Atacama Large Millimeter/submillimeter Array 
(Wootten 2008). 